Multidimensional Scaling (MDS) — “Recreating a map using only pairwise distances”#
Multidimensional Scaling (MDS) is a family of techniques for turning a matrix of pairwise distances / dissimilarities into coordinates in a low-dimensional space.
If you’ve ever seen a road-distance table and wondered “can I draw a map from this?”, you already understand the core idea.
What you’ll learn#
the intuition: map-making from distances (and why the solution is only defined up to rotation/translation/reflection)
the math: distance matrices, the stress objective, and what changes in classical vs metric vs non-metric MDS
how classical MDS becomes an eigendecomposition problem (closed form)
how metric MDS can be solved by iterative stress minimization (SMACOF)
how MDS compares to PCA and Isomap
from dataclasses import dataclass
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, MDS
from sklearn.preprocessing import StandardScaler
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
Prerequisites#
Euclidean distance and what a distance matrix represents
Basic linear algebra: eigenvalues/eigenvectors, matrix multiplication
Helpful (not required): intuition for PCA
1) Intuition: recreating a map from pairwise distances#
Imagine you have no coordinates for a set of cities.
All you have is a table like:
distance from City A to City B
distance from City A to City C
…and so on for every pair
MDS tries to place points on a 2D plane so that the distances on the plane match the distances in the table.
Important subtlety: even with perfect distances, the “map” is not unique.
You can translate (shift) the whole map.
You can rotate it.
You can mirror it.
All of those transformations preserve pairwise distances.
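A quick numerical check of that invariance (a small self-contained sketch; the helper `pdist_demo` is local to this snippet, not part of the notebook's API):

```python
import numpy as np

rng_demo = np.random.default_rng(0)
P = rng_demo.normal(size=(6, 2))  # six random 2D points

def pdist_demo(Q):
    """Pairwise Euclidean distances via broadcasting."""
    diff = Q[:, None, :] - Q[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation
M = np.array([[1.0, 0.0], [0.0, -1.0]])          # mirror (reflection)
shift = np.array([5.0, -3.0])                    # translation

# All three transforms leave the distance matrix unchanged.
for transformed in (P @ R.T, P @ M.T, P + shift):
    assert np.allclose(pdist_demo(transformed), pdist_demo(P))
```

This is exactly why MDS output is only identified up to such transforms, and why the plots below use a Procrustes alignment before visual comparison.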
# Helpers (NumPy-first)
def pairwise_euclidean_distances(X: np.ndarray) -> np.ndarray:
"""Compute the full pairwise Euclidean distance matrix for X.
Args:
X: (n, d) array
Returns:
D: (n, n) distance matrix where D[i, j] = ||x_i - x_j||_2
"""
X = np.asarray(X, dtype=float)
if X.ndim != 2:
raise ValueError("X must be a 2D array of shape (n, d)")
sq_norms = np.sum(X**2, axis=1)
D2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
D2 = np.maximum(D2, 0.0)
return np.sqrt(D2)
def validate_distance_matrix(D: np.ndarray, *, tol: float = 1e-10) -> np.ndarray:
"""Validate a distance/dissimilarity matrix.
Checks shape, finiteness, symmetry, and (approximately) zero diagonal.
"""
D = np.asarray(D, dtype=float)
if D.ndim != 2 or D.shape[0] != D.shape[1]:
raise ValueError("D must be a square matrix")
if not np.all(np.isfinite(D)):
raise ValueError("D must contain only finite values")
if np.min(D) < -tol:
raise ValueError("D must be non-negative (up to numerical tolerance)")
if not np.allclose(np.diag(D), 0.0, atol=tol, rtol=0.0):
raise ValueError("D must have a (near) zero diagonal")
if not np.allclose(D, D.T, atol=tol, rtol=0.0):
raise ValueError("D must be symmetric")
return D
def procrustes_align(reference: np.ndarray, target: np.ndarray, *, allow_scaling: bool = True) -> np.ndarray:
"""Align `target` to `reference` using an optimal orthogonal transform.
This is only for plotting/visual comparison. Pairwise distances are invariant to
translation/rotation/reflection, so different MDS solutions can look "different"
while still being correct.
"""
ref = np.asarray(reference, dtype=float)
tgt = np.asarray(target, dtype=float)
if ref.ndim != 2 or tgt.ndim != 2:
raise ValueError("reference and target must be 2D arrays")
if ref.shape != tgt.shape:
raise ValueError("reference and target must have the same shape")
ref0 = ref - ref.mean(axis=0, keepdims=True)
tgt0 = tgt - tgt.mean(axis=0, keepdims=True)
# Solve: min_R || (tgt0 @ R) - ref0 ||_F, s.t. R is orthogonal
U, s, Vt = np.linalg.svd(tgt0.T @ ref0, full_matrices=False)
R = U @ Vt
scale = (s.sum() / (np.sum(tgt0**2) + 1e-12)) if allow_scaling else 1.0
return scale * tgt0 @ R
# Synthetic "cities" on a plane
n_cities = 18
cities_true = rng.uniform(0, 10, size=(n_cities, 2))
city_labels = [f"C{i+1}" for i in range(n_cities)]
# The only input MDS needs: pairwise distances
D = validate_distance_matrix(pairwise_euclidean_distances(cities_true))
# Reconstruct coordinates from the distance matrix
mds_sklearn = MDS(
n_components=2,
metric=True,
dissimilarity="precomputed",
random_state=42,
n_init=8,
max_iter=500,
)
cities_mds = mds_sklearn.fit_transform(D)
cities_mds_aligned = procrustes_align(cities_true, cities_mds, allow_scaling=True)
mds_sklearn.stress_
0.03431239684685934
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
"True map (unknown to MDS)",
"Reconstructed using only distances (metric MDS)",
),
)
fig.add_trace(
go.Scatter(
x=cities_true[:, 0],
y=cities_true[:, 1],
mode="markers+text",
text=city_labels,
textposition="top center",
name="true",
),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(
x=cities_mds_aligned[:, 0],
y=cities_mds_aligned[:, 1],
mode="markers+text",
text=city_labels,
textposition="top center",
name="mds",
),
row=1,
col=2,
)
for c in [1, 2]:
fig.update_xaxes(scaleanchor=f"y{'' if c == 1 else c}", scaleratio=1, row=1, col=c)
fig.update_layout(
title="MDS as map-making: reconstructing coordinates from distances",
height=450,
showlegend=False,
)
fig.show()
2) Mathematical explanation#
Distance (dissimilarity) matrix#
We start with an \(n \times n\) matrix \(\Delta\) where each entry is a dissimilarity:

\[\Delta_{ij} = \text{dissimilarity between items } i \text{ and } j, \qquad \Delta_{ij} = \Delta_{ji}, \quad \Delta_{ii} = 0.\]

If the dissimilarities come from Euclidean distances between unknown points \(x_i\), then:

\[\Delta_{ij} = \lVert x_i - x_j \rVert_2.\]
Embedding distances#
We want points \(y_i \in \mathbb{R}^p\) (often \(p=2\)) so the induced distances

\[d_{ij}(Y) = \lVert y_i - y_j \rVert_2\]

match the input as well as possible.
Stress function#
A standard objective is raw stress (with optional weights \(w_{ij} \ge 0\)):

\[\mathrm{Stress}(Y) = \sum_{i<j} w_{ij}\,\bigl(d_{ij}(Y) - \Delta_{ij}\bigr)^2\]
Minimizing stress is exactly the “move the points until the distances match” story.
Classical vs metric vs non-metric MDS#
| Variant | What you try to preserve | Typical solver | Notes |
|---|---|---|---|
| Classical MDS (Torgerson–Gower) | Distances, assuming they are Euclidean | Eigendecomposition (closed form) | Very fast; closely related to PCA |
| Metric MDS | Distances (in the metric sense) | Iterative stress minimization (e.g., SMACOF) | Handles general dissimilarities; may have local minima |
| Non-metric MDS | Rank order of dissimilarities | Stress minimization + monotone regression | Great when “only order matters” (ratings, human judgments) |
A useful mental model:
Metric MDS cares about how far.
Non-metric MDS cares about which is farther.
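A tiny check of that distinction: any strictly increasing transform of the dissimilarities changes the “how far” but leaves the “which is farther” (rank order) untouched. A self-contained sketch using `scipy.stats.spearmanr`:

```python
import numpy as np
from scipy.stats import spearmanr

rng_demo = np.random.default_rng(1)
d_demo = rng_demo.uniform(0.1, 10.0, size=50)  # some dissimilarities

# Strictly increasing transforms preserve the rank order exactly,
# so the Spearman rank correlation stays at 1.
for transform in (np.square, np.sqrt, np.log):
    rho = spearmanr(d_demo, transform(d_demo)).statistic
    assert np.isclose(rho, 1.0)

# Metric MDS treats d_demo and d_demo**2 as very different inputs;
# non-metric MDS sees exactly the same problem in both.
```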
3) Optimization process#
3.1 Classical MDS: eigen decomposition (closed form)#
If \(\Delta\) is a matrix of Euclidean distances, we can recover an (approximate) inner-product matrix.
Let \(J = I - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\) be the centering matrix and let \(\Delta^{\circ 2}\) be elementwise-squared distances.
Define the double-centered matrix:

\[B = -\tfrac{1}{2}\, J\, \Delta^{\circ 2}\, J\]
When the distances are truly Euclidean, \(B\) is a Gram matrix \(B = YY^\top\).
So we eigendecompose:

\[B = Q \Lambda Q^\top, \qquad Y = Q_p \Lambda_p^{1/2},\]

using the top \(p\) positive eigenvalues (diagonal of \(\Lambda_p\)) and their eigenvectors (columns of \(Q_p\)).
Connection to PCA:
If \(\Delta\) comes from Euclidean distances of data points, classical MDS and PCA produce the same coordinates (up to rotation).
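That claim is easy to verify numerically. The sketch below (self-contained; all names are local to the snippet) computes PCA scores from an SVD of the centered data and classical MDS from the corresponding distance matrix, then checks that the two embeddings agree up to an orthogonal transform:

```python
import numpy as np

rng_demo = np.random.default_rng(2)
X_demo = rng_demo.normal(size=(30, 5))
Xc = X_demo - X_demo.mean(axis=0)      # centered data

# PCA scores via SVD of the centered data: scores = U * s
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Y_pca_demo = (U * s)[:, :2]

# Classical MDS from the squared Euclidean distance matrix of the same points
sq = np.sum(Xc**2, axis=1)
D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xc @ Xc.T, 0.0)
n = Xc.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Bc = -0.5 * J @ D2 @ J                 # double-centered Gram matrix
w, V = np.linalg.eigh(Bc)
order = np.argsort(w)[::-1]
Y_mds_demo = V[:, order[:2]] * np.sqrt(w[order[:2]])

# The two embeddings differ only by rotation/reflection,
# so their pairwise distances coincide.
def pairdist_demo(Q):
    diff = Q[:, None, :] - Q[None, :, :]
    return np.sqrt((diff**2).sum(axis=-1))

assert np.allclose(pairdist_demo(Y_pca_demo), pairdist_demo(Y_mds_demo))
# The Gram eigenvalues match the squared singular values of the centered data.
assert np.allclose(np.sort(w)[::-1][:5], s**2)
```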
def classical_mds(D: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
"""Classical MDS (Torgerson-Gower) from a distance matrix.
Classical MDS assumes distances are (approximately) Euclidean.
Returns:
Y: (n, n_components) embedding
eigvals: eigenvalues of the centered Gram matrix (sorted descending)
"""
D = validate_distance_matrix(D)
n = D.shape[0]
if not (1 <= n_components <= n):
raise ValueError("n_components must be between 1 and n")
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D**2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
order = np.argsort(eigvals)[::-1]
eigvals = eigvals[order]
eigvecs = eigvecs[:, order]
# Keep only the leading components; clamp tiny negatives due to numerical noise
L = np.diag(np.sqrt(np.maximum(eigvals[:n_components], 0.0)))
Y = eigvecs[:, :n_components] @ L
return Y, eigvals
cities_classical, eigvals_cities = classical_mds(D, n_components=2)
cities_classical_aligned = procrustes_align(cities_true, cities_classical, allow_scaling=True)
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
"True map",
"Classical MDS (aligned)",
),
)
fig.add_trace(
go.Scatter(
x=cities_true[:, 0],
y=cities_true[:, 1],
mode="markers+text",
text=city_labels,
textposition="top center",
name="true",
),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(
x=cities_classical_aligned[:, 0],
y=cities_classical_aligned[:, 1],
mode="markers+text",
text=city_labels,
textposition="top center",
name="classical",
),
row=1,
col=2,
)
for c in [1, 2]:
fig.update_xaxes(scaleanchor=f"y{'' if c == 1 else c}", scaleratio=1, row=1, col=c)
fig.update_layout(
title="Classical MDS: eigendecomposition reconstruction",
height=450,
showlegend=False,
)
fig.show()
# Eigenvalues of B tell you how many meaningful dimensions the distances support.
# Negative eigenvalues often indicate the distances are not perfectly Euclidean (or there is noise).
fig = px.line(
x=np.arange(1, len(eigvals_cities) + 1),
y=eigvals_cities,
markers=True,
title="Classical MDS: eigenvalues of the centered Gram matrix",
labels={"x": "eigenvalue index", "y": "eigenvalue"},
)
fig.add_hline(y=0, line_dash="dash", line_color="black")
fig.show()
3.2 Metric MDS: stress minimization (SMACOF)#
Metric MDS directly minimizes stress. A common workhorse algorithm is SMACOF (Scaling by MAjorizing a COmplicated Function).
At a high level:
Start with a guess \(Y^{(0)}\).
Repeat:
compute the current distances \(d_{ij}(Y^{(t)})\)
build a matrix \(B(Y^{(t)})\) that depends on \(\Delta_{ij} / d_{ij}(Y^{(t)})\)
update \(Y^{(t+1)} = \frac{1}{n} B(Y^{(t)}) Y^{(t)}\)
For the raw-stress metric case (all weights 1), SMACOF is popular because it monotonically decreases stress.
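The monotone-decrease property can be sanity-checked in a few lines: starting from a random configuration, each Guttman transform should never increase the raw stress. A self-contained sketch (uniform weights; all names are local to the snippet):

```python
import numpy as np

rng_demo = np.random.default_rng(3)
P_true = rng_demo.uniform(size=(12, 2))
diff = P_true[:, None, :] - P_true[None, :, :]
D_demo = np.sqrt((diff**2).sum(axis=-1))     # target distances

def dist_demo(Q):
    d = Q[:, None, :] - Q[None, :, :]
    return np.sqrt((d**2).sum(axis=-1))

def stress_demo(Q):
    tri = np.triu_indices(len(Q), 1)
    return float(np.sum((dist_demo(Q)[tri] - D_demo[tri]) ** 2))

Y_demo = rng_demo.normal(size=(12, 2))       # random initial guess
for _ in range(5):
    before = stress_demo(Y_demo)
    d = np.maximum(dist_demo(Y_demo), 1e-12)
    B_t = -D_demo / d                        # off-diagonal entries of B(Y)
    np.fill_diagonal(B_t, 0.0)
    np.fill_diagonal(B_t, -B_t.sum(axis=1))  # diagonal absorbs row sums
    Y_demo = (B_t @ Y_demo) / len(Y_demo)    # Guttman transform
    assert stress_demo(Y_demo) <= before + 1e-12  # monotone decrease
```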
def raw_stress(D: np.ndarray, Y: np.ndarray) -> float:
"""Raw stress: sum_{i<j} (||y_i - y_j|| - D_ij)^2."""
D = validate_distance_matrix(D)
Y = np.asarray(Y, dtype=float)
if Y.ndim != 2 or Y.shape[0] != D.shape[0]:
raise ValueError("Y must have shape (n, p) matching D")
dist = pairwise_euclidean_distances(Y)
tri = np.triu_indices(D.shape[0], 1)
return float(np.sum((dist[tri] - D[tri]) ** 2))
def smacof_metric(
D: np.ndarray,
n_components: int = 2,
max_iter: int = 300,
tol: float = 1e-7,
random_state: int = 42,
) -> tuple[np.ndarray, np.ndarray]:
"""A small, from-scratch SMACOF implementation for metric MDS (raw stress).
The update has the form:
Y <- (1/n) B(Y) Y
For raw stress with uniform weights:
B_ij = -D_ij / d_ij(Y) for i != j
B_ii = -sum_{j != i} B_ij
Returns:
Y: embedding (n, n_components)
stress_history: array of raw-stress values per iteration
"""
D = validate_distance_matrix(D)
n = D.shape[0]
if not (1 <= n_components <= n):
raise ValueError("n_components must be between 1 and n")
if max_iter <= 0:
raise ValueError("max_iter must be > 0")
if tol <= 0:
raise ValueError("tol must be > 0")
tri = np.triu_indices(n, 1)
rng_local = np.random.default_rng(random_state)
Y = rng_local.normal(scale=1.0, size=(n, n_components))
Y -= Y.mean(axis=0, keepdims=True)
stress_history: list[float] = []
eps = 1e-12
for _ in range(max_iter):
dist = pairwise_euclidean_distances(Y)
dist = np.maximum(dist, eps)
ratio = D / dist
B = -ratio
np.fill_diagonal(B, 0.0)
np.fill_diagonal(B, -B.sum(axis=1))
Y_new = (1.0 / n) * (B @ Y)
Y_new -= Y_new.mean(axis=0, keepdims=True)
stress = float(np.sum((pairwise_euclidean_distances(Y_new)[tri] - D[tri]) ** 2))
stress_history.append(stress)
if len(stress_history) >= 2:
prev = stress_history[-2]
if (prev - stress) / (prev + eps) < tol:
Y = Y_new
break
Y = Y_new
return Y, np.asarray(stress_history)
@dataclass
class ScratchMetricMDS:
"""Metric MDS solved with SMACOF (learning-oriented)."""
n_components: int = 2
max_iter: int = 300
tol: float = 1e-7
random_state: int = 42
embedding_: np.ndarray | None = None
stress_history_: np.ndarray | None = None
def fit_transform(self, D: np.ndarray) -> np.ndarray:
self.embedding_, self.stress_history_ = smacof_metric(
D,
n_components=self.n_components,
max_iter=self.max_iter,
tol=self.tol,
random_state=self.random_state,
)
return self.embedding_
scratch_mds = ScratchMetricMDS(n_components=2, max_iter=300, tol=1e-8, random_state=42)
cities_smacof = scratch_mds.fit_transform(D)
stress_hist = scratch_mds.stress_history_
cities_smacof_aligned = procrustes_align(cities_true, cities_smacof, allow_scaling=True)
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
"Metric MDS (SMACOF) embedding (aligned)",
"Stress vs iterations",
),
)
fig.add_trace(
go.Scatter(
x=cities_smacof_aligned[:, 0],
y=cities_smacof_aligned[:, 1],
mode="markers+text",
text=city_labels,
textposition="top center",
name="smacof",
),
row=1,
col=1,
)
fig.add_trace(
go.Scatter(
x=np.arange(1, len(stress_hist) + 1),
y=stress_hist,
mode="lines+markers",
name="stress",
),
row=1,
col=2,
)
fig.update_xaxes(scaleanchor="y", scaleratio=1, row=1, col=1)
fig.update_xaxes(title_text="iteration", row=1, col=2)
fig.update_yaxes(title_text="raw stress", row=1, col=2)
fig.update_layout(
title="Metric MDS via SMACOF: embedding + optimization trace",
height=450,
showlegend=False,
)
fig.show()
# Quick sanity check (like in the supervised notebooks): scratch vs sklearn
comparison = (
pd.DataFrame(
[
{"method": "Classical MDS (eig)", "raw stress": raw_stress(D, cities_classical)},
{"method": "Metric MDS (SMACOF scratch)", "raw stress": raw_stress(D, cities_smacof)},
{"method": "Metric MDS (sklearn)", "raw stress": raw_stress(D, cities_mds)},
]
)
.sort_values("raw stress")
.reset_index(drop=True)
)
comparison
|  | method | raw stress |
|---|---|---|
| 0 | Classical MDS (eig) | 1.226099e-27 |
| 1 | Metric MDS (SMACOF scratch) | 1.194457e-20 |
| 2 | Metric MDS (sklearn) | 1.563523e-02 |
4) Plotly diagnostics: distance preservation errors#
Perfect Euclidean distances are a best case.
In practice, your dissimilarities might come from:
measurement noise
road distances (not straight-line Euclidean)
human similarity ratings
So the distances may not be exactly representable in 2D.
Below we add a small amount of symmetric noise to the distance matrix and compare:
Classical MDS (eigendecomposition)
Metric MDS (our SMACOF optimizer)
Metric MDS (scikit-learn, as a reference implementation)
# Add symmetric noise to the distance matrix (simulate imperfect measurements)
noise_level = 0.08
noise = rng.normal(loc=0.0, scale=noise_level, size=D.shape)
D_noisy = D * (1.0 + noise)
D_noisy = np.maximum(D_noisy, 0.0)
D_noisy = 0.5 * (D_noisy + D_noisy.T)
np.fill_diagonal(D_noisy, 0.0)
cities_classical_noisy, eigvals_noisy = classical_mds(D_noisy, n_components=2)
cities_smacof_noisy, stress_hist_noisy = smacof_metric(
D_noisy,
n_components=2,
max_iter=500,
tol=1e-8,
random_state=42,
)
mds_sklearn_noisy = MDS(
n_components=2,
metric=True,
dissimilarity="precomputed",
random_state=42,
n_init=8,
max_iter=500,
)
cities_mds_noisy = mds_sklearn_noisy.fit_transform(D_noisy)
methods_noisy = {
"Classical MDS": cities_classical_noisy,
"Metric MDS (SMACOF)": cities_smacof_noisy,
"Metric MDS (sklearn)": cities_mds_noisy,
}
def pairwise_distance_errors(D_target: np.ndarray, Y: np.ndarray) -> dict[str, np.ndarray | float]:
D_target = validate_distance_matrix(D_target)
Y = np.asarray(Y, dtype=float)
D_emb = pairwise_euclidean_distances(Y)
n = D_target.shape[0]
tri = np.triu_indices(n, 1)
d0 = D_target[tri]
d1 = D_emb[tri]
rel = (d1 - d0) / (d0 + 1e-12)
abs_rel = np.abs(rel)
return {
"abs_rel_error": abs_rel,
"spearman_r": float(spearmanr(d0, d1).statistic),
}
rows = []
summary = []
for name, Y in methods_noisy.items():
stats = pairwise_distance_errors(D_noisy, Y)
rows.append(pd.DataFrame({"method": name, "abs_rel_error": stats["abs_rel_error"]}))
summary.append(
{
"method": name,
"mean(|Δd|/d)": float(np.mean(stats["abs_rel_error"])),
"median(|Δd|/d)": float(np.median(stats["abs_rel_error"])),
"spearman_r": float(stats["spearman_r"]),
}
)
df_err = pd.concat(rows, ignore_index=True)
df_summary = pd.DataFrame(summary).sort_values("mean(|Δd|/d)")
df_summary
|  | method | mean(\|Δd\|/d) | median(\|Δd\|/d) | spearman_r |
|---|---|---|---|---|
| 2 | Metric MDS (sklearn) | 0.045843 | 0.038304 | 0.990990 |
| 1 | Metric MDS (SMACOF) | 0.045848 | 0.038141 | 0.990970 |
| 0 | Classical MDS | 0.068116 | 0.049934 | 0.986239 |
fig = px.histogram(
df_err,
x="abs_rel_error",
color="method",
nbins=35,
barmode="overlay",
opacity=0.6,
marginal="box",
title="Distance preservation errors on a noisy distance matrix",
labels={"abs_rel_error": "|d_emb - d_target| / d_target"},
)
fig.update_layout(height=450)
fig.show()
5) Comparison: MDS vs PCA vs Isomap#
All three produce low-dimensional embeddings, but they optimize different things.
| Method | Input you start from | What it tries to preserve | When it shines |
|---|---|---|---|
| PCA | feature matrix \(X\) | variance in a linear subspace | fast baseline, denoising, roughly-linear structure |
| (Classical/Metric) MDS | distance matrix \(\Delta\) | pairwise distances (or ranks of distances) | when “distances come first” (similarity tables, human judgments, graph distances) |
| Isomap | feature matrix \(X\) | geodesic distances on a manifold | nonlinear “unrolling” when a manifold model is reasonable |
Rules of thumb:
Use PCA when a linear subspace is a good approximation.
Use MDS when you trust your dissimilarities (or only trust their order, for non-metric MDS).
Use Isomap when local neighborhoods are meaningful and you want to preserve them globally.
Fun fact: Isomap is essentially classical MDS applied to a geodesic distance matrix.
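That fact is directly checkable: fit Isomap, pull out its internal geodesic distance matrix (`dist_matrix_`), run classical MDS on it by hand, and compare pairwise distances of the two embeddings. A self-contained sketch (names are local to the snippet):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X_demo, _ = make_swiss_roll(n_samples=400, random_state=0)
iso_demo = Isomap(n_neighbors=10, n_components=2).fit(X_demo)
D_geo_demo = iso_demo.dist_matrix_           # graph shortest-path distances

# Classical MDS on the geodesic distance matrix
n = D_geo_demo.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Bg = -0.5 * J @ (D_geo_demo**2) @ J
w, V = np.linalg.eigh(Bg)
idx = np.argsort(w)[::-1][:2]
Y_cmds = V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

def pairdist_demo(Q):
    d = Q[:, None, :] - Q[None, :, :]
    return np.sqrt((d**2).sum(axis=-1))

# Equal up to rotation/reflection => pairwise distances agree (to solver tolerance).
gap = np.abs(pairdist_demo(Y_cmds) - pairdist_demo(iso_demo.embedding_))
assert gap.max() < 1e-4 * pairdist_demo(Y_cmds).max()
```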
# A classic nonlinear dataset: the swiss roll
X3, t = make_swiss_roll(n_samples=450, noise=0.05, random_state=42)
X3 = StandardScaler().fit_transform(X3)
# PCA (linear)
Y_pca = PCA(n_components=2, random_state=42).fit_transform(X3)
# Classical MDS on Euclidean distances in 3D
D_euclid = validate_distance_matrix(pairwise_euclidean_distances(X3))
Y_mds, _ = classical_mds(D_euclid, n_components=2)
# Isomap (geodesic distances)
iso = Isomap(n_neighbors=10, n_components=2)
Y_iso = iso.fit_transform(X3)
D_geo = iso.dist_matrix_
df_embed = pd.concat(
[
pd.DataFrame({"dim1": Y_pca[:, 0], "dim2": Y_pca[:, 1], "method": "PCA", "t": t}),
pd.DataFrame({"dim1": Y_mds[:, 0], "dim2": Y_mds[:, 1], "method": "Classical MDS", "t": t}),
pd.DataFrame({"dim1": Y_iso[:, 0], "dim2": Y_iso[:, 1], "method": "Isomap", "t": t}),
],
ignore_index=True,
)
fig = px.scatter(
df_embed,
x="dim1",
y="dim2",
color="t",
facet_col="method",
facet_col_spacing=0.06,
title="Swiss roll: PCA vs MDS vs Isomap",
labels={"t": "roll parameter"},
height=450,
)
fig.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig.update_xaxes(matches=None)
fig.update_yaxes(matches=None)
fig.show()
def relative_rmse(D_target: np.ndarray, Y: np.ndarray) -> float:
D_target = validate_distance_matrix(D_target)
Y = np.asarray(Y, dtype=float)
D_emb = pairwise_euclidean_distances(Y)
n = D_target.shape[0]
tri = np.triu_indices(n, 1)
d0 = D_target[tri]
d1 = D_emb[tri]
rel = (d1 - d0) / (d0 + 1e-12)
return float(np.sqrt(np.mean(rel**2)))
metrics = pd.DataFrame(
[
{
"method": "PCA (vs Euclidean)",
"relative RMSE": relative_rmse(D_euclid, Y_pca),
},
{
"method": "Classical MDS (vs Euclidean)",
"relative RMSE": relative_rmse(D_euclid, Y_mds),
},
{
"method": "Isomap (vs geodesic)",
"relative RMSE": relative_rmse(D_geo, Y_iso),
},
]
)
metrics
|  | method | relative RMSE |
|---|---|---|
| 0 | PCA (vs Euclidean) | 0.282550 |
| 1 | Classical MDS (vs Euclidean) | 0.282550 |
| 2 | Isomap (vs geodesic) | 0.317378 |
Pitfalls & diagnostics#
Local minima (metric/non-metric MDS): different initializations can land in different solutions; try multiple n_init.
Non-Euclidean dissimilarities: classical MDS may show negative eigenvalues; metric/non-metric MDS is often safer.
Scaling matters: if dissimilarities are on wildly different scales, stress values are hard to interpret; consider normalization.
Missing distances: real problems may have unknown entries; you’ll need weights/masks (not covered here).
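The missing-distances case is conceptually simple even though it is out of scope here: pairs with \(w_{ij}=0\) just drop out of the stress objective. A minimal sketch (the helper `weighted_raw_stress_demo` is hypothetical, not part of the code above):

```python
import numpy as np

def weighted_raw_stress_demo(D, Y, W):
    """sum_{i<j} W_ij * (||y_i - y_j|| - D_ij)^2, with W_ij = 0 marking a missing pair."""
    diff = Y[:, None, :] - Y[None, :, :]
    d_emb = np.sqrt((diff**2).sum(axis=-1))
    tri = np.triu_indices(D.shape[0], 1)
    return float(np.sum(W[tri] * (d_emb[tri] - D[tri]) ** 2))

rng_demo = np.random.default_rng(4)
P_demo = rng_demo.uniform(size=(8, 2))
diff = P_demo[:, None, :] - P_demo[None, :, :]
D_demo = np.sqrt((diff**2).sum(axis=-1))

W_demo = np.ones_like(D_demo)
W_demo[0, 1] = W_demo[1, 0] = 0.0   # pretend this pair was never measured

# The true configuration has zero stress whether or not the pair is observed.
assert weighted_raw_stress_demo(D_demo, P_demo, W_demo) < 1e-12
```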
Exercises#
Add noise to the distance matrix \(\Delta\) and observe what happens to (a) eigenvalues in classical MDS and (b) the stress curve in metric MDS.
Implement a weighted stress function where some pairs have \(w_{ij}=0\) (missing distances).
Try non-metric MDS on data where distances are monotone-transformed (e.g., square all distances) and compare Spearman correlation.
On the swiss roll, change n_neighbors in Isomap and see when it fails (too small = disconnected graph, too large = geodesics become Euclidean).
References#
scikit-learn docs: sklearn.manifold.MDS, sklearn.manifold.Isomap
Borg & Groenen — Modern Multidimensional Scaling
Torgerson (1952), Gower (1966) — classical MDS foundations